AN OVERVIEW OF COMPUTER-BASED
NATURAL LANGUAGE PROCESSING*

Preface 

Computer-based  Natural  Language  Processing  (NLP)  is  the 
key  to  enabling  humans  and  their  computer-based  creations  to 
interact  with  machines  in  natural  languages  like  English  and 
Japanese  (in  contrast  to  formal  computer  languages).  The 
doors  that  such  an  achievement  can  open  have  made  this  a major 
research  area  in  Artificial  Intelligence  and  Computational  Linguistics. 
Commercial natural language interfaces to computers have recently
entered  the  market  and  the  future  looks  bright  for  other  applications 
as  well. 

This  report  reviews  the  basic  approaches  to  such  systems, 
the  techniques  utilized,  applications,  the  state-of-the-art 
of  the  technology,  issues  and  research  requirements,  the  major 
participants,  and  finally,  future  trends  and  expectations. 

It  is  anticipated  that  this  report  will  prove  useful  to 
engineering  and  research  managers,  potential  users,  and  others 
who  will  be  affected  by  this  field  as  it  unfolds. 


* This  report  is  part  of  the  NBS/NASA  series  of  overview 
reports on Artificial Intelligence and Robotics.


NATURAL LANGUAGE PROCESSING


A.  Introduction 

One  major  goal  of  Artificial  Intelligence  (AI)  research  has  been  to 
develop  the  means  to  interact  with  machines  in  natural  language  (in 
contrast  to  a computer  language).  The  interaction  may  be  typed, 
printed  or  spoken.  The  complementary  goal  has  been  to  understand 
how  humans  communicate.  The  scientific  endeavor  aimed  at  achieving 
these  goals  has  been  referred  to  as  computational  linguistics  (or 
more  broadly  as  cognitive  science),  an  effort  at  the  intersection 
of  AI,  linguistics,  philosophy  and  psychology. 

Human  communication  in  natural  language  is  an  activity  of  the 
whole intellect. AI researchers, in trying to formalize what
is required to properly address natural language, find themselves
involved in the long-term endeavor of having to come to grips
with this whole activity. (Formal linguists tend to restrict
themselves  to  the  structure  of  language.)  The  current  AI  approach 
is  to  conceptualize  language  as  a knowledge-based  system  for 
processing  communications  and  to  create  computer  programs  to 
model  that  process. 

Communication acts can serve many purposes, depending on the
goals, intentions and strategies of the communicator. One goal
of the communication is to change some aspect of the recipient's
mental state. Thus, communication endeavors to add or modify
knowledge, change a mood, elicit a response or establish a new
goal for the recipient.

For a computer program to interpret a relatively unrestricted
natural  language  communication,  a great  deal  of  knowledge  is 
required.  Knowledge  is  needed  of: 

-the  structure  of  sentences 

-the  meaning  of  words 

-the  morphology  of  words 

-a  model  of  the  beliefs  of  the  sender 

-the  rules  of  conversation,  and 

-an extensive shared body of general information about the
world.


This  body  of  knowledge  can  enable  a computer  (like  a human)  to  use 
expectation-driven processing in which knowledge about the usual
properties  of  known  objects,  concepts,  and  what  typically  happens 
in  situations,  can  be  used  to  understand  incomplete  or  ungrammatical 
sentences  in  appropriate  contexts. 

Thus,  Barrow  (1979, p. 12)  observes: 


In  current  attempts  to  handle  natural  language,  the  need  to 
use  knowledge  about  the  subject  matter  of  the  conversation, 
and  not  just  grammatical  niceties,  is  recognized--it  is 
now  believed  that  reliable  translation  is  not  possible 
without  such  knowledge.  It  is  essential  to  find  the  best 
interpretation  of  what  is  uttered  that  is  consistent  with 
all  sources  of  knowledge  — lexical,  grammatical,  semantic 
(meaning),  topical,  and  contextual. 


Arden (1980, p. 463) adds:

In  writing  a program  for  understanding  languages,  one  is 
faced  with  all  the  problems  of  artificial  intelligence, 
problems  of  coping  with  huge  amounts  of  knowledge,  of  finding 
ways  to  represent  and  describe  complex  cognitive  structures, 
as  well  as  finding  an  appropriate  structure  in  a gigantic 
space  of  possibilities.  Much  of  the  research  in  understanding 
natural  languages  is  aimed  at  these  problems. 

As  indicated  earlier,  natural  language  communication  between 
humans  is  very  dependent  upon  shared  knowledge,  models  of  the 
world,  models  of  the  individuals  they  are  communicating  with,  and 
the  purposes  or  goals  of  the  communication.  Because  the  listener 
has  certain  expectations  based  on  the  context  and  his  (or  her) 
models,  it  is  often  the  case  that  only  minimal  cues  are  needed 
in  the  communication  to  activate  these  models  and  determine 
the  meaning. 

The  next  section,  B,  briefly  outlines  applications  for 
natural  language  processing  (NLP)  systems.  Sections  C to  I 
review  the  technology  involved  in  constructing  such  systems, 
with  existing  NLP  systems  being  summarized  in  Section  J. 

The  state  of  the  art,  problems  and  issues,  research  requirements 
and the principal participants in NLP are covered in Sections
K through N. Section O provides a forecast of future developments.

A glossary  of  terms  in  NLP  is  provided  at  the  back  of  this 
report. Further sources of information are listed in Section P.


B. Applications:


There  are  many  applications  for  computer-based  natural  language 
understanding systems. Some of these are listed in Table I.

Discourse
    Speech Understanding
    Story Understanding
Information Access
    Information Retrieval
    Question Answering Systems
    Computer-Aided Instruction
Information Acquisition or Transformation
    Machine Translation
    Document or Text Understanding
    Automatic Paraphrasing
    Knowledge Compilation
    Knowledge Acquisition
Interaction with Intelligent Programs
    Expert Systems Interfaces
    Decision Support Systems
    Explanation Modules for Computer Actions
    Interactive Interfaces to Computer Programs
Interacting with Machines
    Control of Complex Machines
Language Generation
    Document or Text Generation
    Speech Output
    Writing Aids: e.g., grammar checking

TABLE I. Some Applications of Natural Language Processing


C. Approach:


Natural  Language  Processing  (NLP)  systems  utilize  both  linguistic 
knowledge  and  domain  knowledge  to  interpret  the  input.  As  domain 
knowledge  (knowledge  about  the  subject  area  of  the  communication) 
is  so  important  to  understanding,  it  is  usual  to  classify  the 
various  systems  based  on  their  representation  and  utilization 
of  domain  knowledge.  On  this  basis,  Hendrix  and  Sacerdoti  (1981) 
classify  systems  as  Types  A,  B or  C*,  with  Type  A being  the  simplest, 
least  capable  and  correspondingly  least  costly  systems. 

1. Type A: No World Models

a. Key Words or Patterns

The  simplest  systems  utilize  ad  hoc  data  structures  to 
store  facts  about  a limited  domain.  Input  sentences  are  scanned 
by  the  programs  for  predeclared  key  words,  or  patterns,  that 
indicate  known  objects  or  relationships.  Using  this  approach, 
early  simple  template-based  systems,  while  ignoring  the 
complexities  of  language,  sometimes  were  able  to  achieve 
impressive  results.  Usually,  heuristic  empirical  rules  were 
used  to  guide  the  interpretations. 
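
The key-word and pattern approach can be sketched in a few lines. The example below is only an illustration under stated assumptions: the fact table, the single pattern, and the function name answer are hypothetical and are not taken from any of the systems cited in this report.

```python
import re

# Ad hoc fact store for a tiny, made-up domain (hypothetical data).
FACTS = {("capital", "france"): "Paris", ("capital", "japan"): "Tokyo"}

# Predeclared patterns; each one maps the matched text to a key in FACTS.
PATTERNS = [
    (re.compile(r"\bcapital of (\w+)", re.I),
     lambda m: ("capital", m.group(1).lower())),
]

def answer(sentence):
    """Return a stored fact if any predeclared pattern matches, else None."""
    for pattern, make_key in PATTERNS:
        match = pattern.search(sentence)
        if match:
            return FACTS.get(make_key(match))
    return None
```

Because the rest of the sentence is ignored, such a program answers "What is the capital of France?" correctly, but it would give the same answer to any input that merely contains the words "capital of France".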

b. Limited Logic Systems

In limited logic systems, information in the data base was
stored in some formal notation, and language mechanisms were
utilized to translate the input into the internal form. The
internal form chosen was such as to facilitate performing logical
inferences on information in the data base.

*Other system classifications are possible, e.g., those based
on the range of syntactic coverage.

2. Type B: Systems That Use Explicit World Models

In  these  systems,  knowledge  about  the  domain  is  explicitly  encoded, 
usually  in  frame  or  network  representations  (discussed  in  a later 
section)  that  allow  the  system  to  understand  input  in  terms  of 
context and expectations. Cullingford's work (see Schank and
Abelson, 1977) on SAM (Script Applier Mechanism) is a good example
of  this  approach. 

3. Type C: Systems That Include Information about the Goals
and Beliefs of Intelligent Entities

These advanced systems (still in the research stage) attempt
to include in their knowledge base information about the beliefs
and intentions of the participants in the communication. If
the goal of the communication is known, it is much easier to interpret
the message. Schank and Abelson's (1977) work on plans and themes
reflects this approach.


D. The Parsing Problem


For  more  complex  systems  than  those  based  on  key  words  and  pattern 
matching,  language  knowledge  is  required  to  interpret  the  sentences. 

The  system  usually  begins  by  "parsing"  the  input  (processing 
an  input  sentence  to  produce  a more  useful  representation  for 
further  analysis).  This  representation  is  normally  a structural 
description  of  the  sentence  indicating  the  relationships  of 
the  component  parts.  To  address  the  parsing  problem  and  to  interpret 
the  result,  the  computational  linguistic  community  has  studied 
syntax,  semantics,  and  pragmatics.  Syntax  is  the  study  of  the 
structure  of  phrases  and  sentences.  Semantics  is  the  study  of  meaning. 
Pragmatics is the study of the use of language in context.


E. Grammars


Barr and Feigenbaum (1981, p. 229) state, "A grammar of a language
is a scheme for specifying the sentences allowed in the language,
indicating the syntactic rules for combining words into well-formed
phrases and clauses." The following grammars are some
of  the  most  important.* 

1. Phrase Structure Grammar - Context-Free Grammar

Chomsky (see, e.g., Winograd, 1983) had a major impact on linguistic
research by devising a mathematical approach to language. He
defined a series of grammars based on rules for rewriting sentences
into their component parts. He designated these as types 0, 1, 2,
or 3, based on the restrictions associated with the rewrite
rules, with 3 being the most restrictive.

Type 2--Context-Free (CF) or Phrase Structure Grammar (PSG)--
has been one of the most useful in natural-language processing.
It has the advantage that all sentence structure derivations
can be represented as a tree and practical parsing algorithms
exist. Though it is a relatively natural grammar, it is unable
to capture all of the sentence constructions found in most natural
languages such as English. Gazdar (1981) has recently broadened
the applicability of CF PSG by adding augmentations to handle
situations that do not fit the basic grammar. This Generalized
Phrase Structure Grammar is now being developed by Hewlett Packard
(Gawron et al., 1982).

*Charniak and Wilks (1976) provide a good overview of the various
approaches.


2. Transformational Grammar

Tennant (1981, p. 89) observes that "The goal of a language analysis
program is recognizing grammatical sentences and representing
them in a canonical structure (the underlying structure)." A
transformational grammar (Chomsky, 1957) consists of a dictionary,
a phrase structure grammar and a set of transformations. In
analyzing a sentence, the phrase structure grammar is first used
to produce a parse tree. This is called the surface structure.

The  transformational  rules  are  then  applied  to  the  parse  tree 
to  transform  it  into  a canonical  form  called  the  deep  (or  underlying) 
structure.  As  the  same  thing  can  be  stated  in  several  different 
ways,  there  may  be  many  surface  structures  that  translate  into 
a single  deep  structure. 
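
The many-to-one mapping from surface structures to a deep structure can be illustrated with a toy transformation. The sketch below is an assumption-laden illustration, not a real transformational grammar: parse trees are nested tuples, the function name to_deep is hypothetical, and the only rule handled is a simple passive of the form "X was V-ed by Y".

```python
def to_deep(tree):
    """Undo a toy passive transformation; other trees pass through unchanged."""
    label, subject, vp = tree
    # Passive surface pattern: S(NP_object, VP(AUX, V, PP(by, NP_agent)))
    is_passive = (
        len(vp) == 4
        and vp[1][0] == "AUX"
        and vp[3][0] == "PP"
        and vp[3][1] == ("PREP", "by")
    )
    if is_passive:
        verb, agent = vp[2], vp[3][2]
        # Canonical (deep) form: agent as subject, surface subject as object.
        return ("S", agent, ("VP", verb, subject))
    return tree

# Two surface structures for the same underlying statement.
active = ("S", ("NP", "the boy"), ("VP", ("V", "opened"), ("NP", "the door")))
passive = ("S", ("NP", "the door"),
           ("VP", ("AUX", "was"), ("V", "opened"),
            ("PP", ("PREP", "by"), ("NP", "the boy"))))
```

Here both the active and the passive surface tree reduce to the same canonical tree, which is the property later processing stages rely on.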

3. Case Grammar

Case  Grammar  is  a form  of  Transformational  Grammar  in  which  the 
deep  structure  is  based  on  cases  - semantically  relevant  syntactic 
relationships.  The  central  idea  is  that  the  deep  structure  of  a 
simple  sentence  consists  of  a verb  and  one  or  more  noun  phrases 
associated  with  the  verb  in  a particular  relationship.  These 
semantically  relevant  relationships  are  called  cases.  Fillmore 
(1971) proposed the following cases: Agent, Experiencer, Instrument,
Object, Source, Goal, Location, Type and Path.

The cases for each verb form an ordered set referred to as a "case
frame." A case frame for the verb "open" would be:

(object  (instrument)  (agent)) 

which indicates that "open" always has an object, but the instrument
or agent can be omitted, as indicated by their surrounding
parentheses. Thus the case frame associated with the verb provides
a template  which  aids  in  understanding  a sentence. 
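
As a minimal sketch of how such a template might be used, the case frame for "open" can be stored as a table of required and optional cases and checked against the cases found in a sentence. The table layout and the function name fill_case_frame are illustrative assumptions.

```python
# Case frame for "open": the object is required; the instrument and the
# agent are optional, mirroring (object (instrument) (agent)).
CASE_FRAMES = {
    "open": {"required": ["object"], "optional": ["instrument", "agent"]},
}

def fill_case_frame(verb, **cases):
    """Check the supplied cases against the verb's frame and fill its slots."""
    frame = CASE_FRAMES[verb]
    allowed = frame["required"] + frame["optional"]
    missing = [c for c in frame["required"] if c not in cases]
    unknown = [c for c in cases if c not in allowed]
    if missing or unknown:
        raise ValueError(f"missing={missing} unknown={unknown}")
    # Omitted optional cases are left as None.
    return {case: cases.get(case) for case in allowed}
```

For "John opened the door", fill_case_frame("open", object="door", agent="John") fills the object and agent slots and leaves the instrument empty, while a call without an object is rejected.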

4. Semantic Grammars

For practical systems in limited domains, it is often more useful
to replace conventional syntactic constituents such as
noun phrases, verb phrases and prepositions with meaningful
semantic components. Thus, in place of nouns when dealing
with a naval data base, one might use ships, captains, ports
and cargoes. This approach gives direct access to the semantics
of a sentence and substantially simplifies and shortens the
processing. Grammars based on this approach are referred to
as semantic grammars (see, e.g., Burton, 1976).
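
A single rule of a semantic grammar for a naval data base might be sketched as follows. The ship names, the attribute list, and the one rule shown are hypothetical, chosen only to show that the categories are domain concepts rather than syntactic classes.

```python
import re

# Semantic categories take the place of syntactic ones such as "noun".
SHIP = r"(?P<ship>kennedy|nimitz)"
ATTRIBUTE = r"(?P<attribute>captain|length|home port)"

# One rule of the grammar: QUERY -> "what is the" ATTRIBUTE "of the" SHIP
QUERY = re.compile(rf"what is the {ATTRIBUTE} of the {SHIP}\??$", re.I)

def parse_query(sentence):
    """Return (attribute, ship) if the sentence fits the rule, else None."""
    match = QUERY.match(sentence.strip())
    if match is None:
        return None
    return match.group("attribute").lower(), match.group("ship").lower()
```

The result of a successful parse is already in a form that could be handed to a data base lookup, which is what is meant by direct access to the semantics.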

5. Other Grammars

A variety of other, but less prominent, grammars have been devised.
Still others can be expected to be devised in the future. One
example is Montague Grammar (Dowty et al., 1981), which uses
a logical functional representation for the grammar and therefore
is well suited for the parallel-processing logical approach
now being pursued by the Japanese (see Nishida and Doshita,
1982) for their future AI work as embodied in their Fifth Generation
Computer research project.


F. Semantics and the Cantankerous Aspects of Language


Semantic  processing  (as  it  tries  to  interpret  phrases  and  sentences) 
attaches  meanings  to  the  words.  Unfortunately,  English  does 
not  make  this  as  simple  as  looking  up  the  word  in  the  dictionary, 
but  provides  many  difficulties  which  require  context  and  other 
knowledge  to  resolve. 

1. Multiple Word Senses

Syntactic  analysis  can  resolve  whether  a word  is  used  as  a noun 
or  a verb,  but  further  analysis  is  required  to  select  the  sense 
(meaning)  of  the  noun  or  verb  that  is  actually  used.  For  example, 
"fly"  used  as  a noun  may  be  a winged  insect,  a fancy  fishhook, 
a baseball  hit  high  in  the  air,  or  several  other  interpretations 
as  well.  The  appropriate  sense  can  be  determined  by  context 
(e.g.,  for  "fly"  the  appropriate  domain  of  interest  could  be 
extermination,  fishing  or  sports),  or  by  matching  each  noun 
sense  with  the  senses  of  other  words  in  the  sentence.  This 
latter approach was taken by Rieger and Small (1979) using the
(still embryonic) technique of "interacting word experts", and
by Finin (1980) and McDonald (1982) as the basis for understanding
noun  compounds. 
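
Selecting a sense from the domain of interest can be sketched as a count of cue words. This is a deliberately crude illustration and not the word-expert technique cited above; the sense inventory, the cue words, and the function name choose_sense are all assumptions.

```python
# Hypothetical sense inventory: each sense of the noun "fly" lists cue words
# from its domain of interest (extermination, fishing, sports).
SENSES = {
    "insect": {"swat", "buzz", "exterminator", "wings"},
    "fishing lure": {"trout", "rod", "cast", "stream"},
    "baseball hit": {"outfield", "batter", "inning", "caught"},
}

def choose_sense(sentence, senses=SENSES):
    """Pick the sense whose cue words overlap most with the sentence."""
    words = set(sentence.lower().replace(".", "").split())
    scores = {sense: len(cues & words) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    # With no cue words present, the context gives no basis for a choice.
    return best if scores[best] > 0 else None
```

A sentence mentioning a batter and the outfield selects the baseball sense, while a sentence with no cue words is left unresolved.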

2. Modifier Attachment

Where to attach a prepositional phrase to the parse tree cannot
be determined by syntax alone but requires semantic knowledge.
"Put the plant in the box on the table" is an example illustrating
the difficulties that can be encountered with prepositional
phrases: "in the box" may identify which plant is meant or say
where the plant is to be put.


3. Noun-Noun Modification

Choosing  the  appropriate  relationship  when  one  noun  modifies  another 
depends  on  semantics.  For  example,  for  "apple  vendor",  one's  knowledge 
tends  to  force  the  interpretation  "vendor  of  apples"  rather 
than  "an  apple  that  is  a vendor." 


4. Pronouns

Pronouns  allow  a simplified  reference  to  previously  used  (or 
implied)  nouns,  sets  or  events.  Where  feasible,  using  pragmatics, 
pronoun  antecedents  are  usually  identified  by  reference  to  the 
most  recent  noun  phrase  having  the  same  context  as  the  pronoun. 
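
The most-recent-antecedent heuristic can be sketched as a backward search through the noun phrases mentioned so far. In this sketch, agreement in number and gender stands in for "having the same context"; that substitution, the feature table, and the function name resolve are assumptions made for illustration.

```python
# Hypothetical agreement features for a few pronouns: (number, gender).
PRONOUNS = {
    "he": ("sing", "masc"),
    "she": ("sing", "fem"),
    "it": ("sing", "neut"),
    "they": ("plur", None),   # None: gender is not checked
}

def resolve(pronoun, noun_phrases):
    """Return the most recent noun phrase whose features fit the pronoun.

    noun_phrases is a list of (text, number, gender) in order of mention.
    """
    number, gender = PRONOUNS[pronoun.lower()]
    for text, np_number, np_gender in reversed(noun_phrases):
        if np_number == number and (gender is None or np_gender == gender):
            return text
    return None
```

Given the mentions "the captain", "the ship" and "the sailors" in that order, "it" resolves to "the ship" and "he" skips back to "the captain".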


5. Ellipsis and Substitution

Ellipsis is the phenomenon of not stating explicitly some words in
a sentence, but leaving it to the reader or listener to fill them in.
Substitution is similar--using a dummy word in place of the omitted
words. Employing pragmatics, ellipses and substitutions are
usually resolved by matching the incomplete statement to the
structures of previous recent sentences--finding the best partial
match and then filling in the rest from this matching previous
structure.


6. Other Difficulties


In  addition  to  those  just  mentioned,  there  are  other  difficulties, 
such as anaphoric references, ambiguous noun groups, adjectivals,
and incorrect language usage.

G. Knowledge Representation*

As  the  AI  approach  to  natural  language  processing  is  heavily 
knowledge  based,  it  is  not  surprising  that  a variety  of  knowledge 
representation  (KR)  techniques  have  found  their  way  into  the 
field.  Some  of  the  more  important  ones  are: 

1. Procedural Representations - The meanings of words or sentences
are expressed as computer programs that reason about their
meaning.

2.  Declarative  Representations 

a.  Logic  - Representation  in  First  Order  Predicate  Logic, 
for  example. 

b.  Semantic  Networks  - Representations  of  concepts  and 
relationships  between  concepts  as  graph  structures 
consisting  of  nodes  and  labeled  connecting  arcs. 

3. Case Frames - (covered earlier)

4. Conceptual Dependency - This approach (related to case frames)
is an attempt to provide a representation of all actions in
terms of a small number of semantic primitives into which input
sentences are mapped (see, e.g., Schank and Riesbeck, 1981).
The system relies on 11 primitive physical, instrumental and
mental ACTs (propel, grasp, speak, attend, PTRANS, ATRANS,
etc.), plus several other categories or concept types.

5. Frame - A complex data structure for representing a whole
situation, complex object or series of events. A frame has
slots for objects and relations appropriate to the situation.

6. Scripts - Frame-like data structures for representing
stereotyped sequences of events to aid in understanding simple
stories.

*More complete presentations on KR can be found in Chapter III
of Barr and Feigenbaum (1981), and in Gevarter (1983).


H. Syntactic Parsing


Parsing assigns structures to sentences. The following types
have been developed over the years for NLP (Barr and Feigenbaum,
1981).

1. Template Matching: Most of the early (and some current)
NL programs performed parsing by matching their input sentences
against a series of stored templates.

2. Transition Nets:

Phrase structure grammars can be syntactically decomposed using
a set of rewrite rules such as indicated in Figure 1. Observe
that a simple sentence can be rewritten as a Noun Phrase and
a Verb Phrase as indicated by:

S -> NP VP

The noun phrase can be rewritten by the rule

NP -> (DET) (ADJ*) N (PP*)

where the parentheses indicate that the item is optional, while
the asterisk indicates that any number of the items may occur.
The items, if they appear in the sentence, must occur in the order
shown. The following example shows how a noun phrase can be
analyzed.

NP                             -> DET ADJ   N         PP
The large satellite in the sky -> The large satellite in the sky

where PP is a prepositional phrase.

Thus, the parser examines the first word to see if it corresponds
to its list of determiners (the, a, one, every, etc.). If the first
word is found to be a determiner, the parser notes this and
proceeds on to the next word; otherwise it checks to see if
the first word is an adjective, and so forth. If a preposition
is encountered in the sentence, the parser calls the prepositional
phrase (PP) rule.

GRAMMAR

S  -> NP VP
NP -> (DET) (ADJ*) N (PP*)
PP -> PREP NP
VP -> VTRAN NP

Figure 1. A Transition Network for a Small Subset of English. Each diagram represents a rule for
finding the corresponding word pattern. Each rule can call on other rules to find needed patterns.
After Graham (1979, p. 214).

An NP transition network is shown as the second diagram in Figure
1, where it starts in the initial state (4) and moves to state
(5) if it finds a determiner or an adjective, or on to state
(6) when a noun is found. The loops for ADJ and PP indicate
that more than one adjective or prepositional phrase can
occur. Note that the PP rule can in turn call an NP rule, resulting
in a nested structure. An example of an analyzed noun phrase
is shown in Figure 2.
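
The NP and PP rules can be sketched as two mutually recursive procedures, one per network. This is an illustrative sketch only: the small lexicon and the function names are assumptions, and each PP is simply attached to the nearest preceding noun phrase, which sidesteps the attachment ambiguity discussed in Section F.

```python
# Hypothetical lexicon for the example phrases.
LEXICON = {
    "DET": {"the", "a"},
    "ADJ": {"large"},
    "N": {"payload", "tether", "shuttle", "satellite", "sky"},
    "PREP": {"on", "under", "in"},
}

def is_a(words, i, category):
    """True if word i exists and belongs to the given category."""
    return i < len(words) and words[i] in LEXICON[category]

def parse_np(words, i=0):
    """Follow NP -> (DET) (ADJ*) N (PP*); return (tree, next_index) or None."""
    tree = ["NP"]
    if is_a(words, i, "DET"):            # optional determiner
        tree.append(("DET", words[i]))
        i += 1
    while is_a(words, i, "ADJ"):         # ADJ loop
        tree.append(("ADJ", words[i]))
        i += 1
    if not is_a(words, i, "N"):          # the noun is required
        return None
    tree.append(("N", words[i]))
    i += 1
    while True:                          # PP loop
        result = parse_pp(words, i)
        if result is None:
            return tree, i
        subtree, i = result
        tree.append(subtree)

def parse_pp(words, i):
    """Follow PP -> PREP NP, calling the NP network recursively."""
    if is_a(words, i, "PREP"):
        result = parse_np(words, i + 1)
        if result is not None:
            subtree, j = result
            return ["PP", ("PREP", words[i]), subtree], j
    return None
```

Applied to the words of "the payload on a tether under the shuttle", parse_np consumes all eight words and returns a nested tree in which "under the shuttle" sits inside the noun phrase headed by "tether".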

NP:  The  payload  on a tether under the shuttle.
     DET  N        PP

     PP:  on    a tether under the shuttle.
          PREP  NP

          NP:  a    tether  under the shuttle.
               DET  N       PP

               PP:  under  the shuttle.
                    PREP   NP

                    NP:  the  shuttle.
                         DET  N

Figure 2a. Example Noun Phrase Decomposition

Figure 2b. Parse Tree Representation of the Noun Phrase Surface Structure.

As the transition networks analyze a sentence, they can collect
information about the word patterns they recognize and fill
slots in a frame associated with each pattern. Thus, they can identify
noun phrases as singular or plural, whether the nouns refer
to persons and if so their gender, etc., as needed to produce
a deep structure. A simple approach to collecting this information
is to attach subroutines to be called for each transition. A
transition network with such subroutines attached is called
an "augmented transition network", or ATN. With ATN's, word
patterns can be recognized. For each word pattern, we can fill
slots in a frame. The resulting filled frames provide a basis
for further processing.
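
The idea of attaching a subroutine to each transition can be sketched for a simple noun phrase without prepositional phrases. The state names, the lexicon with its number feature, and the slot names below are assumptions for illustration; a full ATN would also allow recursive calls to other networks and tests on the arcs.

```python
# Hypothetical lexicon: word -> (category, features).
LEXICON = {
    "the": ("DET", {}),
    "large": ("ADJ", {}),
    "satellite": ("N", {"number": "singular"}),
    "satellites": ("N", {"number": "plural"}),
}

# Subroutines attached to transitions; each fills slots in the frame.
def set_determiner(frame, word, features):
    frame["determiner"] = word

def add_adjective(frame, word, features):
    frame["adjectives"].append(word)

def set_head(frame, word, features):
    frame["head"] = word
    frame["number"] = features["number"]

# Arcs: (state, category) -> (next state, subroutine to call).
ARCS = {
    ("start", "DET"): ("mid", set_determiner),
    ("start", "ADJ"): ("mid", add_adjective),
    ("start", "N"): ("done", set_head),
    ("mid", "ADJ"): ("mid", add_adjective),
    ("mid", "N"): ("done", set_head),
}

def parse_np(words):
    """Traverse the network, filling a frame; return None on failure."""
    frame = {"determiner": None, "adjectives": [], "head": None, "number": None}
    state = "start"
    for word in words:
        category, features = LEXICON.get(word, (None, {}))
        if (state, category) not in ARCS:
            return None
        state, action = ARCS[(state, category)]
        action(frame, word, features)
    return frame if state == "done" else None
```

For "the large satellites" the returned frame records the determiner, the adjective list, the head noun and the fact that the phrase is plural.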


3. Other Parsers


Other  parsing  approaches  have  been  devised,  but  ATN's  remain 
the most popular syntactic parsers. ATN's are top-down parsers
in that the parsing is directed by an anticipated sentence structure.
An alternative approach is bottom-up parsing, which examines
the input words along the string from left to right, building
up all possible structures to the left of the current word as
the parser advances. A bottom-up parser could thus build many
partial sentence structures that are never used, but the diversity
could be an advantage in trying to interpret input word strings
that are not clearly delineated sentences or contain ungrammatical
constructions or unknown words. There have been recent attempts
to combine the top-down with the bottom-up approach for NLP
in a manner similar to what has been done for Computer Vision (see,
e.g., Gevarter, 1982).

For  a recent  overview  of  parsing  approaches  see  Slocum  (1981). 
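
One way to see how a bottom-up parser accumulates partial structures is a chart that records every category found over every span of the input. The sketch below uses a CYK-style chart over binary rules, which is a substitute chosen for brevity and not the specific left-to-right procedure described above; the grammar, lexicon and function name are assumptions.

```python
from collections import defaultdict

# Hypothetical lexicon and binary rules: (left, right) -> parent category.
LEXICON = {"the": "DET", "a": "DET", "shuttle": "N", "payload": "N",
           "carries": "V"}
RULES = {("NP", "VP"): "S", ("DET", "N"): "NP", ("V", "NP"): "VP"}

def bottom_up_chart(words):
    """Return a chart where chart[(i, j)] holds every category over words[i:j]."""
    chart = defaultdict(set)
    for i, word in enumerate(words):
        if word in LEXICON:              # unknown words simply leave a gap
            chart[(i, i + 1)].add(LEXICON[word])
    n = len(words)
    for width in range(2, n + 1):        # build wider spans from narrower ones
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        if (left, right) in RULES:
                            chart[(i, j)].add(RULES[(left, right)])
    return chart
```

For "the shuttle carries a payload" the chart contains S over the whole string. If the last word is replaced by an unknown word, no S is found, but the chart still holds the noun phrase "the shuttle", which is the kind of partial structure that may help in interpreting defective input.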


I. Semantics, Parsing and Understanding


The  role  of  syntactic  parsing  is  to  construct  a parse  tree  or 
similar  structure  of  the  sentence  to  indicate  the  grammatical 
use  of  the  words  and  how  they  are  related  to  each  other.  The 
role  of  the  semantic  processing  is  to  establish  the  meaning  of  the 
sentence. This requires facing up to all the cantankerous ambiguities
discussed earlier.

In  natural  languages  (unlike  restricted  languages,  e.g.,  semantic 
grammars)  it  is  often  difficult  to  parse  the  sentences  and  hook 
phrases  into  the  proper  portion  of  the  parse  tree,  without  some 
knowledge  of  the  meaning  of  the  sentence.  This  is  especially 
true  when  the  discourse  is  ungrammatical.  Therefore,  it  has 
been suggested (see, e.g., Charniak, 1981) that semantics be
used  to  help  guide  the  path  of  the  syntactic  parser.  For  that 
case,  syntax  presses  ahead  as  far  as  it  can  and  then  hands  off 
its  results  to  the  semantic  portion  to  resolve  the  ambiguities. 

Woods  (1980)  has  extended  ATN  grammars  for  this  purpose.  Barr 
and  Feigenbaum  (1981,  p.  257)  indicate  that  present  language 
understanding  systems  are  indeed  tending  toward  the  use  of  multiple 
sources  of  knowledge  and  are  intermixing  syntactics  and  semantics. 

Charniak  (1981)  indicates  that  there  have  been  two  main  lines 
of  attack  on  word  sense  ambiguity.  One  is  the  use  of  discrimination 
nets (Rieger and Small, 1979) that utilize the syntactic parse
tree (by observing the grammatical role that the word plays,
such as taking a direct object, etc.) in helping to decide the
word sense. The other approach is based on the frame/script
idea (used, e.g., for story comprehension) that provides a context
and the expected sense of the word (see, e.g., Schank and Abelson,
1977).


Another  approach  is  "preference  semantics"  (Wilks,  1975)  which 
is  a system  of  semantic  primitives  through  which  the  best  sense 
in  context  is  determined.  This  system  uses  a lexicon  in  which 
the  various  senses  of  the  words  are  defined  in  terms  of  semantic 
primitives  (grouped  into  entities,  actions,  cases,  qualifiers, 
and  type  indicators).  Representation  of  a sentence  is  in  terms 
of  these  primitives  which  are  arranged  to  relate  agents,  actions 
and  objects.  These  have  preferential  relations  to  each  other. 
Wilks' approach finds the match that best satisfies these
preferences.
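
The notion of choosing the reading that satisfies the most preferences can be sketched with a toy lexicon. This is a loose illustration of the idea only and not Wilks' system: the two-primitive vocabulary ("human", "thing"), the sense lists, and the function name best_reading are assumptions.

```python
from itertools import product

# Hypothetical lexicon: each noun sense is tagged with a semantic primitive.
NOUN_SENSES = {
    "crook": [("criminal", "human"), ("shepherd's staff", "thing")],
    "policeman": [("police officer", "human")],
}

# Each verb sense states the primitives it prefers for its agent and object.
VERB_SENSES = {
    "interrogated": [("question", {"agent": "human", "object": "human"})],
}

def best_reading(agent, verb, obj):
    """Return the sense combination that satisfies the most preferences."""
    def score(reading):
        (_, agent_type), (_, prefs), (_, object_type) = reading
        return (prefs["agent"] == agent_type) + (prefs["object"] == object_type)

    readings = product(NOUN_SENSES[agent], VERB_SENSES[verb], NOUN_SENSES[obj])
    agent_sense, verb_sense, object_sense = max(readings, key=score)
    return agent_sense[0], verb_sense[0], object_sense[0]
```

For "The policeman interrogated the crook", the reading in which the crook is a criminal satisfies both preferences of the verb and is therefore chosen over the shepherd's staff.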

Charniak  indicates  that  the  semantics  at  the  level  of  the  word 
sense  is  not  the  end  of  the  parsing  process,  but  what  is  desired 
is  understanding  or  comprehension  (associated  with  pragmatics). 
Here frames, scripts and more advanced topics such
as plans, goals, and knowledge structures (see, e.g., Schank
and Riesbeck, 1981) play an important role.


J. NLP Systems


As  indicated  below,  various  NLP  systems  have  been  developed 
for  a variety  of  functions. 


1. Kinds

a. Question Answering Systems

Question  answering  natural  language  systems  have  perhaps 
been  the  most  popular  of  the  NLP  research  systems.  They  have 
the advantage that they usually utilize a data base for a limited
domain  and  that  most  of  the  user  discourse  is  limited  to  questions. 

b. Natural Language Interfaces (NLI's)

These systems are designed to provide a painless means
of communicating questions or instructions to a complex computer
program.


c. Computer-Aided Instruction (CAI)


Arden (1980, p. 465) states:

One  type  of  interaction  that  calls  for  ability  in  natural 
languages  is  the  interaction  needed  for  effective  teaching 
machines.  Advocates  of  computer-aided  instruction  have 
embraced  numerous  schemes  for  putting  the  computer  to  use 
directly  in  the  educational  process.  It  has  long  been 
recognized  that  the  ultimate  effectiveness  of  teaching 
machines  is  linked  to  the  amount  of  intelligence  embodied 
in  the  programs.  That  is,  a more  intelligent  program  would 
be  better  able  to  formulate  the  questions  and  presentations 
that  are  most  appropriate  at  a given  point  in  a teaching 
dialogue,  and  it  would  be  better  equipped  to  understand  a 
student's  response,  even  to  analyze  and  model  the  knowledge 
state  of  the  student,  in  order  to  tailor  the  teaching  to 
his  needs.  Several  researchers  have  already  used  the  teaching 
dialogue as the basis for looking at natural languages
and reasoning. For example, the SCHOLAR system of Carbonell
and Collins tutors students in geography, doing complex
reasoning in deciding what to ask and how to respond to
a question. Meanwhile, SOPHIE, teaches electronic circuits
by integrating a natural-language component with a specialized
system  for  simulating  circuit  behavior.  Although  these 
systems  are  still  too  costly  for  general  use,  they  will 
almost  certainly  be  developed  further  and  become  practical 
in  the  near  future. 


d. Discourse

Systems  that  are  designed  to  understand  discourse  (extended 
dialogue)  usually  employ  pragmatics.  Pragmatic  analysis  requires 
a model  of  the  mutual  beliefs  and  knowledge  held  by  the  speaker 
and  listener. 


e. Text Understanding

Though  Schank  (see  Schank  and  Riesbeck,  1981)  and  others 
have  addressed  themselves  to  this  problem,  much  more  remains 
to  be  done.  Techniques  for  understanding  printed  text  include 
scripts  and  causative  approaches. 

Arden  (1980,  pp.  465-466)  states: 

To  understand  a text,  a system  needs  not  only  a knowledge 
of  the  structure  of  the  language  but  a body  of  "world 
knowledge"  about  the  domain  discussed  in  the  text.  Thus 
a comprehensive, text-understanding system presupposes
an  extensive  reasoning  system,  one  with  a base  of  common- 
sense  and  domain-specific  knowledge. 

The problem of "understanding" a piece of text does, however,
serve  as  a basic  framework  for  current  research  in  natural 
languages.  Programs  are  written  which  accept  text  input 
and  illustrate  their  understanding  of  it  by  answering 
questions,  giving  paraphrases,  or  simply  providing  a blow- 
by-blow  account  of  the  reasoning  that  goes  on  during  the 
analysis.  Generally,  the  programs  operate  only  on  a small 
preselected  set  of  texts  created  or  chosen  by  the  author 
for  exploring  a small  set  of  theoretical  problems. 
f. Text Generation


There are two major aspects of text generation: one is
the determination of the content and textual shape of the message;
the second is transforming it into natural language. There
are two approaches for accomplishing this. The first is indexing
into canned text and combining it as appropriate. The second
is generating the text from basic considerations. One need for
text generation arises when information sources must be combined
to form a new message. Unfortunately,
simply adjoining sentences from different contexts usually produces
confusing or misleading text. Another need for text generation
is for explanations of Expert System actions. Text generation
will become particularly important as data bases gradually shift
to true knowledge bases where complex output has to be presented
linguistically. McDonald's thesis (1980) provides one of the
most sophisticated approaches to text generation.
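
The first approach, indexing into canned text and combining it, can be
illustrated with a minimal sketch. The message types, slot names and the
helper functions below are hypothetical and are not drawn from any system
discussed in this report; the sketch only shows the general idea.

```python
# Hypothetical canned-text store: each message type indexes a template
# whose slots are filled from the content chosen for the message.
CANNED_TEXT = {
    "fault_found": "The {component} is faulty because {reason}.",
    "no_fault": "No fault was found in the {component}.",
}


def generate(message_type, **content):
    """Look up a canned template and fill its slots with the content."""
    template = CANNED_TEXT[message_type]
    return template.format(**content)


def generate_report(messages):
    """Combine several canned sentences into one short text."""
    return " ".join(generate(kind, **content) for kind, content in messages)
```

Joining sentences this way is exactly the kind of simple adjoining that
the text warns about: nothing in the sketch smooths the transitions between
sentences or adapts them to context, which is what generation from basic
considerations would have to address.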

g. System Building Tools

Recently,  computer  languages  and  programs  especially 
designed  to  aid  in  building  NLP  systems  have  begun  to  appear. 

An  example  is  OWL  developed  at  MIT  as  a semantic  network  knowledge 
representation  language  for  use  in  constructing  natural  language 
question  answering  systems. 
2. Research NLP Systems

Until  recently,  virtually  all  of  the  NLP  systems  generated  were 
of  a research  nature.  These  NLP  systems  basically  were  aimed 
at  serving  five  functions: 

a.  Interfaces  to  Computer  Programs 

b.  Data  Base  Retrieval 

c.  Text  Understanding 

d.  Text  Generation 

e.  Machine  Translation 

A few  of  the  more  prominent  systems  are  briefly  reviewed  in 
this  section. 

a. Interfaces to Computer Programs
One of the most important early NLP systems, SHRDLU, was a complete
system combining syntactic and semantic processing. This system,
designed as an interface to a research Blocks World simulation,
is described in Table IIa.

SOPHIE (Table IIb), a Computer-Aided Instruction (CAI) system,
made  use  of  a semantic  grammar  to  parse  the  input  and  to  provide 
instruction  based  on  a simulation  of  a power  supply  circuit. 

TDUS (Table IIc) uses a procedural network (which encodes basic
repair  operations)  to  interpret  a dialogue  with  an  apprentice 
engaged  in  repair  of  an  electro-mechanical  pump. 
b. Natural Language Interfaces to Large Data Bases
One of the important and prominent research areas for NLP is
intelligent front ends to data base retrieval systems. LUNAR
(Table IId) is one of the most often cited early systems. It
utilized  a powerful  ATN  syntactic  parser  which  passed  on  its 
results  to  a semantic  analyzer. 

PLANES (Table IIe) was a system designed as a front end to
the Navy's database of maintenance and flight records for all
naval aircraft. This semantic-grammar-based system ignores
the sentence's syntax, searching instead for meaningful
semantic constituents by using ATN subnets. These subnets include
PLANETYPE, TIME PERIOD, ACTION, etc.
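
The idea of searching a sentence for meaningful semantic constituents,
regardless of overall syntax, can be sketched as follows. This is only an
illustration under stated assumptions: the actual subnets were ATN networks,
which are replaced here by simple regular expressions, and the small
vocabulary and the function name are hypothetical.

```python
import re

# Hypothetical mini-lexicon. Each entry stands in for one semantic subnet;
# regular expressions are a deliberate simplification of ATN subnets.
CONSTITUENT_PATTERNS = {
    "PLANETYPE": re.compile(r"\b(A7|F4|Skyhawk|Phantom)s?\b", re.IGNORECASE),
    "TIME PERIOD": re.compile(
        r"\b(?:in|during)\s+((?:January|May|June)\s+\d{4})\b", re.IGNORECASE
    ),
    "ACTION": re.compile(r"\b(flights?|flew|maintenance|repairs?)\b", re.IGNORECASE),
}


def find_constituents(sentence):
    """Return the semantic constituents found anywhere in the sentence."""
    found = {}
    for name, pattern in CONSTITUENT_PATTERNS.items():
        match = pattern.search(sentence)
        if match:
            found[name] = match.group(1)
    return found
```

Because each pattern is applied independently, the same constituents are
recovered whether the time period comes first or last in the request, which
is the practical benefit of ignoring sentence syntax.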


ROBOT (Table IIf) uses an ATN syntactic parser followed by a
semantic  analyzer  to  produce  a formal  query  language  representation 
of  the  input  sentence.  ROBOT  has  proved  to  be  very  versatile. 

LIFER/LADDER (Table IIg) uses patterns or templates to interpret
sentences. It employs a semantic (pragmatic) grammar, which
greatly simplifies the interpretation. It can handle ellipsis and
pronouns.

c. Text Understanding

SAM (Table IIh) is a research system that attempts to understand
text  about  everyday  events.  Knowledge  is  encoded  in  frames 
called  scripts.  SAM  uses  an  English  to  Conceptual  Dependency 
parser  to  produce  an  internal  representation  of  the  story. 
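
A script can be thought of as an ordered list of stereotyped events that
lets a program fill in what a story leaves unsaid. The following is a
minimal sketch of that idea only. It uses plain strings where SAM used
Conceptual Dependency structures, and the restaurant script and function
name are hypothetical.

```python
# Hypothetical restaurant script: an ordered list of stereotyped events.
RESTAURANT_SCRIPT = ["enter", "be seated", "order", "eat", "pay", "leave"]


def apply_script(script, story_events):
    """Return (event, stated) pairs for the script span covered by the story.

    Events the story mentions are marked stated=True; script events that fall
    between the first and last mentioned event are inferred (stated=False).
    """
    positions = [script.index(e) for e in story_events if e in script]
    if not positions:
        return []
    first, last = min(positions), max(positions)
    mentioned = set(story_events)
    return [(event, event in mentioned) for event in script[first:last + 1]]
```

Given a story that mentions only entering, ordering and leaving, the sketch
infers that the customer was also seated, ate and paid, which is the kind
of unstated knowledge a question about the story may depend on.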

PAM (Table IIi) is one offspring of SAM. PAM understands stories
by determining the goals that are to be achieved in the story.
It then attempts to match actions of the story with methods
that it knows will achieve the goals.

d. Text Generation

Winograd  (1983)  indicates  that  the  difficult  problems  in  generation 
are  those  concerned  with  meaning  and  context  rather  than  syntax. 
Thus, until recently, text generation has been mostly an
outgrowth of portions of other NLP systems.
e. Machine Translation


Though machine translation was the first attempt at NLP, early
failures resulted in little further work being done in this
area until recently.

f. Current Research NLP Systems
Table III lists NLP systems currently being researched.

3.  Commercial  Systems: 

The  commercial  systems  available  today  (together  with  their 
approximate price) are listed in Table IV. Several of these
systems  are  derivatives  of  the  research  NLP  systems  previously 
discussed. 
TABLE III
K. State of the Art


It is now feasible to use computers to deal with natural language
input in highly restricted contexts. However, interacting with
people in a facile manner is still far off, requiring understanding
of where people are coming from - their knowledge, goals and moods.

In today's computing environment, the only systems that perform
robustly and efficiently are Type A systems: those that do not
use explicit world models, but depend on key word or pattern
matching and/or semantic grammars. In actual working systems,
for both understanding and text generation, ATN-like grammars can
be considered the state of the art.




L. Problems and Issues


1. How People Use Language

Many  of  the  issues  in  natural  language  understanding 
center around the way people use language. A given speech act
can serve many purposes, depending on the goals, intentions
and strategies of the speaker. Thus, finding methods for determining
the underlying motivation of a speech act is a major issue.

Another  issue  is  understanding  how  humans  process  language  - 
both  in  forming  output  and  in  interpreting  input. 

It  also  appears  that  knowledge-based  inference  is  essential 
to  natural  language  understanding,  as  language  just  provides 
abbreviated cues that must be fleshed out using models and expectations
resident  in  the  receiver.  Finally,  we  do  not  even  have  a good 
handle  on  what  it  means  to  understand  language  and  what  is  the 
relation  between  language  and  perception. 

2. Linguistics

A major  issue  in  NLP  is  how  to  disambiguate  words  to  determine 
their  appropriate  sense  in  the  current  context.  A complementary 
problem  is  dealing  with  novel  language  such  as  metaphors,  idioms, 
similes  and  analogies. 

Syntactic  ambiguity  is  a common  source  of  trouble  in  natural 
language processing. Where to attach modifying clauses is one
problem. However, even handling adverbial modifiers has proved
difficult.
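
The attachment problem can be made concrete by counting how many parse
trees a grammar assigns to a sentence. The sketch below uses prepositional
phrases rather than clauses, a toy grammar in Chomsky normal form, and a
standard CKY chart. The grammar, the lexicon and the function name are all
hypothetical choices made for illustration.

```python
from collections import defaultdict

# Toy grammar (parent, left child, right child), assumed for illustration.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the PP modifies the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # the PP modifies the noun phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"], "man": ["N"],
    "telescope": ["N"], "hill": ["N"], "with": ["P"], "on": ["P"],
}


def count_parses(words, start="S"):
    """Count the distinct parse trees for the word list (CKY chart)."""
    n = len(words)
    # chart[i][j][X] = number of trees rooted in X that span words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[i][i + 1][category] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]
```

With this grammar, "I saw the man with the telescope" has two parses, and
adding "on the hill" raises the count to five, so each further modifier
multiplies the work that later semantic processing must do to choose the
intended reading.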


Another major issue is pragmatics - the study of language in
context. Arden (1980, p. 474) notes:


Many of the issues discussed under frame systems are pertinent
to pragmatic issues. The prototypes stored in a frame
system can include both the prototypes for the domain being
discussed and those related to the conversational situation.

In a travel-planning system, then, a user responds to the
question, "What time do you want to leave?" with the answer:
"I have to be at a meeting by 11." In planning an appropriate
flight, the system makes assumptions about the relevance
of the answer to the question.

This aspect of language is one that is just beginning to
be dealt with in current systems. Although most large
systems  in  the  past  had  specialized  ways  of  dealing  with 
a subset  of  pragmatic  problems,  there  is  as  yet  no  theoretical 
approach.  As  people  look  to  interactive  systems  for  teaching 
and  explanation,  however,  it  seems  likely  that  this  will 
be  the  major  focus  of  research  in  the  1980's. 


3. Conversation

In  the  area  of  everyday  conversation,  the  real  world  is 
extensive,  complex,  largely  unknown  and  unknowable.  This  is 
quite  different  from  the  closed  world  of  many  of  the  research 
NLP  systems. 


"A  major  problem  for  NLP  systems  is  following  the  dialogue  context 
and  being  able  to  ascertain  the  references  of  noun  phrases  by 
taking context into account." (Hendrix and Sacerdoti, 1981, p. 330)

Another major problem is understanding the motivation of the
participants in the discourse in order to penetrate their remarks.

As conversational natural-language communication between individuals
is dependent on what the participants know about each other's
knowledge, beliefs, plans, and goals, methods for developing
and incorporating this knowledge into a computer are a major
issue.

4. Processor Design

"While  many  specific  problems  are  linguistic,...  many  important 
problems  are  actually  general  AI  problems  of  representation 
and  process  organization."  (Arden,  1980,  p.  409) 

A major issue in the design of an NLP system is choosing the
tradeoffs  between  capability,  efficiency  and  simplicity.  Also 
at  issue  are  the  language  constructs  to  be  handled,  generality, 
processing  time  and  costs.  The  choice  of  the  overall  architecture 
of  the  system  and  the  grammar  to  be  used  is  a major  design  decision 
for  which  there  are  as  yet  no  general  criteria. 

Though all natural-language processing systems contain some
sort of parser, the practical design of applications of grammar
to NLP has proved difficult. The design of the parser in both
theory and implementation is a complex problem. Also at issue
is the top-down (ATN-like) approach to parsing versus bottom-up
and combined approaches. In addition, how best to utilize
knowledge sources (phonemic, lexical, syntactic, semantic, etc.)
in designing a parser and a system architecture remains a major
issue.

A problem with the ATN parser approach, with its heavy dependence
on syntax, is how it can be adapted to handle ungrammatical inputs.
Though considerable progress has been made, there is as yet
no clear solution. INTELLECT (a commercial ATN-based system)
handles ungrammatical constructions by relaxing syntactic constraints.
IBM's Epistle System (Jensen and Heidorn, 1983) uses a fitting
procedure on ungrammatical inputs to produce a reasonable approximate
parse. Semantic grammars and expectation-driven systems have
an advantage in overcoming ungrammatical inputs.
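
The general idea of relaxing syntactic constraints can be sketched as a
two-pass parse: try the strict grammar first, and only if it fails retry
with a constraint switched off, recording what was relaxed. This is an
assumed toy illustration and does not describe how INTELLECT or Epistle
are actually implemented; the lexicon, the single "Det N V" sentence
pattern and the function names are hypothetical.

```python
# Hypothetical toy lexicon: word -> (category, grammatical number)
LEXICON = {
    "ship": ("N", "sg"), "ships": ("N", "pl"),
    "sails": ("V", "sg"), "sail": ("V", "pl"),
    "the": ("Det", None),
}


def parse_sentence(words, enforce_agreement=True):
    """Recognize a 'Det N V' sentence; return a parse dict or None."""
    if len(words) != 3:
        return None
    entries = [LEXICON.get(w) for w in words]
    if None in entries:
        return None
    (c1, _), (c2, noun_number), (c3, verb_number) = entries
    if (c1, c2, c3) != ("Det", "N", "V"):
        return None
    if enforce_agreement and noun_number != verb_number:
        return None
    return {"subject": words[1], "verb": words[2]}


def parse_with_relaxation(words):
    """Try the strict grammar first, then retry with agreement relaxed."""
    parse = parse_sentence(words, enforce_agreement=True)
    if parse is not None:
        return parse, []
    parse = parse_sentence(words, enforce_agreement=False)
    if parse is not None:
        return parse, ["subject-verb agreement"]
    return None, []
```

Returning the list of relaxed constraints alongside the parse lets the
rest of the system treat the result as an approximation, for example by
confirming the interpretation with the user.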

Another major issue is whether it is appropriate to keep the semantic
analysis separate from the syntactic analysis, or whether the two should
work interactively (see Charniak, 1981).

Also, is it necessary in NL translating or understanding to
utilize an intermediate representation, or can the final
interpretation be gotten at more directly? If an intermediate
representation is to be used, which one is best? What is the appropriate
role of primitive concepts (such as found in case systems or
conceptual dependency) in natural language processing?

How can we make restricted natural language more palatable to humans?

A major problem is the negative expectations created in the
mind of a naive user when a system doesn't understand an input
sentence. Naive users have difficulty distinguishing between
the limitations in a system's conceptual coverage and the system's
linguistic coverage. A related problem is the system returning
a null answer. This may mislead the user, as an answer may be
null for many reasons. Another problem is ensuring a sufficiently
rapid response to user inputs.
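
One way to make a null answer less misleading is to return the reason
along with the (empty) result. The following is a minimal sketch under
assumed data: the records, the set of known plane types and the function
name are hypothetical and do not come from any system described here.

```python
# Hypothetical in-memory "data base" of maintenance records.
RECORDS = [
    {"plane": "A7", "year": 1973, "repairs": 4},
    {"plane": "F4", "year": 1973, "repairs": 0},
]
# Plane types the data base knows about, whether or not records exist.
KNOWN_PLANES = {"A7", "F4", "A4"}


def answer_repairs(plane, year):
    """Return (rows, explanation) so an empty result is never unexplained."""
    if plane not in KNOWN_PLANES:
        return [], f"The data base has no plane type called {plane}."
    rows = [r for r in RECORDS if r["plane"] == plane and r["year"] == year]
    if not rows:
        return [], f"{plane} is a known plane type, but no records exist for {year}."
    return rows, f"Found {len(rows)} record(s)."
```

The two empty results carry different explanations, so the user can tell
a gap in the system's conceptual coverage from a genuine absence of data.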

One  common  problem  with  real  systems  is  stonewalling  behavior  - 
the  system  not  responding  to  what  the  user  is  really  after  (the 
user's  goal)  because  the  user  hasn't  suitably  worded  the  input. 

Some of the important problems and issues have to do with knowledge
representation:

-Which  knowledge  representation  is  appropriate  for  a given  problem? 
-How  to  represent  such  things  as  space,  time,  events,  human 
behavior,  emotions,  physical  mechanisms  and  many  processes  associated 
with  novel  language? 

-How  can  common  sense  and  plausibility  judgement  (is  this  meaning 
possible?)  be  represented? 

-How  should  items  in  memory  be  indexed  and  accessed? 

-How  should  context  be  represented? 

-How  should  memory  be  updated? 

-How  to  deal  with  inconsistencies? 

-How  can  we  make  the  representations  more  precise? 

-How  can  we  make  the  system  learn  from  experience  so  as  to  build 
up  the  necessary  large  knowledge  base  needed  to  deal  with  the 
real  world? 
-How can we build useful internal representations that correspond
to 3D models, from information provided by natural language?

NLP  usually  takes  the  sentence  as  the  basic  unit  to  be  analyzed. 
Assigning  purpose  and  meaning  to  larger  units  has  proved  difficult. 
The  NRL  Conceptual  Linguistics  Workshop  (1981)  concluded  that 
"Concept  extraction  was  the  most  difficult  task  examined  at 
the  workshop.  Success  depends  on  the  adequacy  of  the  situation- 
context  representation  and  the  development  of  more  sophisticated 
models  of  language  use." 


NLP  has  always  pushed  the  limits  of  computer  capability.  Thus 
a current  problem  is  designing  special  computer  architectures 
and  processors  for  NLP. 


5. Data Base Interfaces

Hendrix and Sacerdoti (1981, pp. 318, 350) point out two problems
particularly  associated  with  data  base  interfaces: 

(1) The need to understand context throws considerable
doubt on the idea of building natural-language interfaces
to systems with knowledge bases independent of the
language processing system itself.

(2) One of the practical problems currently limiting
the use of NLP systems for accessing data bases is
the lack of trained people and good support tools
for creating the knowledge structures needed for each
new data base.
6. Text Understanding


Text  understanding  systems  have  encountered  problems  in  achieving 
practicality,  both  in  terms  of  extending  the  knowledge  of  the 
language and in providing a sufficiently broad base of world
knowledge. The NRL Conceptual Linguistics Workshop (1981)
concluded that "Current systems for extracting information
from military messages use the key word and key phrase methods
which are incapable of providing adequate semantic representation.
In the immediate future, more general methods for concept extraction
probably  will  work  well  only  in  well  defined  subfields  that  are 
carefully  selected  and  painstakingly  modeled." 

SRI  and  the  National  Library  of  Medicine  have  text  understanding 
systems  in  the  research  stage.  SRI  handcodes  logic  formulas 
that  describe  the  content  of  a paragraph.  Queries  are  matched 
against  these  paragraph  descriptions. 
M. Research Required

Current research in natural language processing systems includes
machine translation, information retrieval and interactive
interfaces to computer systems. Important supporting research topics
are language and text analysis, user modeling, domain modeling,
task modeling, discourse modeling, reasoning and knowledge
representation.

Much  of  the  research  required  (as  well  as  the  research  now 
underway)  is  centered  around  addressing  the  problems  and  issues 
discussed  in  the  following  areas: 

1. How People Use Language

The psychological mechanisms underlying human language production
are a fertile field for investigation. Efforts are needed to build
explicit  computational  models  to  help  explain  why  human  languages 
are  the  way  they  are  and  the  role  they  play  in  human  perception. 


2. Linguistics

Further  research  is  needed  on  methods  for  disambiguating  language 
and  for  the  utilization  of  context  in  language  understanding. 

3. Conversation

Additional  work  is  needed  on  ways  to  represent  the  huge  amount  of 
knowledge  needed  for  Natural  Language  Understanding  (NLU). 
A great deal of research is needed to give NLU systems the ability
to  understand  not  only  what  is  actually  said,  but  the  underlying 
intention  as  well. 

Research  is  now  underway  by  many  groups  on  explicitly  modeling 
goals,  intentions  and  planning  abilities  of  people.  Investigation 
of  script  and  frame-based  systems  is  currently  the  most  active 
NLP AI research area.

4. Processor Design

Architectures, grammars, parsing techniques and internal
representations needed for NLP systems remain important research
areas.

One  particularly  fertile  area  is  how  to  best  utilize  semantics 
to guide the path of the syntactic parser. Charniak (1981, p.
1085) indicates that a relatively unexplored area requiring
research  is  the  interaction  between  the  processes  of  language 
comprehension  and  the  form  of  semantic  representation  used. 

Further  work  is  needed  on  bringing  multiple  knowledge  sources 
(KS's:  syntactic,  semantic,  pragmatic  and  contextual)  to  bear 
on  understanding  a natural  language  utterance,  but  still  keeping 
the  KS's  separate  for  easy  updating  and  modification.  Also 
needed  is  further  work  in  AI  problem-solving  to  cope  with  the 
problem  of  finding  an  appropriate  structure  in  the  huge  space 
of  possible  meanings  of  a natural  language  input. 
Improved NLU techniques are needed to handle complex notions
such  as  disjunction,  quantification,  implication,  causality 
and  possibility.  Also  needed  are  better  methods  for  handling 
"open  worlds,"  where  all  things  needed  to  understand  the  world 
are  not  in  the  system's  knowledge  base. 

Further research is also necessary to aid with a common source
of trouble in NLP, that is, dealing with syntactic and semantic
ambiguities  and  how  to  handle  metaphors  and  idioms. 

Finally,  the  problems  of  efficiency,  speed,  portability,  etc., 
discussed  in  the  previous  chapter,  all  are  in  need  of  better  solutions. 

5. Data Base Interfaces

A current research topic is how data base schemas can best be
enriched  to  support  a natural  language  interface,  and  what  would 
be  the  best  logical  structure  for  a particular  data  base. 

Research  is  also  needed  on  more  efficient  methods  for  compiling 
a vocabulary  for  a particular  application. 

6. Text Understanding

Seeking general methods of concept extraction remains one
of  the  major  research  areas  in  text  understanding. 

N. Principal U.S. Participants in NLP
1. Research and Development*
Non-Profit 


SRI 

MITRE 

Universities 


Yale U. - Dept. of Computer Science
U. of CA, Berkeley - Computer Science Div., Dept. of EECS
Carnegie-Mellon U. - Dept. of Computer Science
U. of Illinois, Urbana - Coordinated Science Lab.
Brown U. - Dept. of Computer Science
Stanford U. - Computer Science Dept.
U. of Rochester - Computer Science Dept.
U. of Mass, Amherst - Department of Computer and Information Science
SUNY, Stony Brook - Dept. of Computer Science
U. of CA, Irvine - Computer Science Dept.
U. of PA - Dept. of Computer and Infor. Science
GA Institute of Technology - School of Infor. and Computer Science
USC - Infor. Science Institute
MIT - AI Lab.
NYU - Computer Science Dept. and Linguistic String Project
U. of Texas at Austin - Dept. of Computer Science
Cal. Inst. of Tech.
Brigham Young U. - Linguistics Dept.
Duke U. - Dept. of Computer Science
N. Carolina State - Dept. of Computer Science
Oregon State U. - Dept. of Computer Science

Industrial 


BBN 

TRW  Defense  Systems 

IBM,  Yorktown  Heights,  N.Y. 

Burroughs 
Sperry  Univac 

Systems  Development  Corp,  Santa  Monica 

Hewlett  Packard 

Martin  Marietta,  Denver 

Texas  Instruments,  Dallas 

Xerox  PARC 

Bell  Labs 

Institute  for  Scientific  Information,  Phila.,  PA 
GM Research Labs, Warren, MI
Honeywell


*A  review  of  current  research  in  NLP  is  given  in  Kaplan  (1982). 
2. Principal U.S. Government Agencies Funding NLP Research

ONR (Office of Naval Research)

NSF (National Science Foundation)

DARPA (Defense Advanced Research Projects Agency)

3. Commercial NLP Systems

Artificial  Intelligence  Corp.  Waltham,  Mass. 

Cognitive  Systems  Inc.,  New  Haven,  Conn. 

Symantec,  Sunnyvale,  CA. 

Texas Instruments, Dallas, TX.

Weidner Communications, Inc., Provo, Utah
Savvy  Marketing  International,  Sunnyvale,  CA 
ALPS,  Provo,  UT 

4. Non-U.S.


U.  of  Manchester,  England 
Kyoto  U.,  Japan 
Siemens  Corp.  Germany 
U of  Strathclyde,  Scotland 

Centre National de la Recherche Scientifique, Paris

U. di Udine, Italy

U.  of  Cambridge,  England 

Philips  Res.  Labs,  The  Netherlands 
O. Forecast


Commercial  natural  language  interfaces  (NLI's)  to  computer  programs 
and  data  base  management  systems  are  now  becoming  available. 

The  imminent  advent  of  NLI's  for  micro-computers  is  the  precursor 
for  eventually  making  it  possible  for  virtually  anyone  to  have 
direct  access  to  powerful  computational  systems. 

As  the  cost  of  computing  has  continued  to  fall,  but  the  cost 
of  programming  hasn't,  it  has  already  become  cheaper  in  some 
applications  to  create  NLI  systems  (that  utilize  subsets  of 
English)  than  to  train  people  in  formal  programming  languages. 

Computational  linguists  and  workers  in  related  fields  are  devoting 
considerable  attention  to  the  problems  of  NLP  systems  that  understand 
the  goals  and  beliefs  of  the  individual  communicators.  Though 
progress  has  been  made,  and  feasibility  has  been  demonstrated, 
more  than  a decade  will  be  required  before  useful  systems  with 
these  capabilities  will  become  available. 

One of the problems in implementing new installations of NLP
systems is gathering information about the applicable vocabulary
and the logical structure of the associated data bases. Work
is now underway to develop tools to help automate this task.
Such tools should be available within 5 years.

For  text  understanding,  experimental  programs  have  been  developed 
that  "skim"  stylized  text  such  as  short  disaster  stories  in 
newspapers (DeJong, 1982). Despite the practical problems
of  sufficient  world  knowledge  and  the  extension  of  language 
knowledge  required,  practical  tools  emerging  from  these  efforts 
should  be  available  to  provide  assistance  to  humans  doing  text 
understanding  within  this  decade. 


The  NRL  Computational  Linguistic  Workshop  (1981)  concluded  that 
text  generation  techniques  are  maturing  rapidly  and  new  application 
possibilities  will  appear  within  the  next  five  years. 


The NRL workshop also indicated that:

Machine aids for human translators appear to have a brighter
prospect for immediate application than fully automatic
translation; however, the Canadian French-English weather
bulletin project is a fully automatic system in which only
20% of the translated sentences require minor rewording
before public release. An ambitious common market project
involving machine translation among six European languages
is scheduled to begin shortly. Sixty people will be involved
in that undertaking which will be one of the largest projects
undertaken in computational linguistics.* The panel was
divided  in  its  forecast  on  the  five  year  perspective  of 
machine  translation  but  the  majority  were  very  optimistic. 


Nippon Telegraph and Telephone Corp. in Tokyo has a machine translation
AI project underway. An experimental system for translating
from Japanese to English and vice versa is now being demonstrated.

In  addition,  the  recently  initiated  Japanese  Fifth  Generation 
Computer  effort  has  computer-based  natural  language  understanding 
as  one  of  its  major  goals. 

*EUROTRA - A machine translation project sponsored by the European
Common  Market  - 8 countries,  over  15  universities,  $24  M over 
several  years. 
In summary, natural language interfaces using a limited subset
of English are now becoming available. Hundreds of specialized
systems are already in operation. Major efforts in text
understanding and machine translation are underway, and useful (though
limited) systems will be available within the next five years.

Systems  that  are  heavily  knowledge-based  and  handle  more  complete 
sets  of  English  should  be  available  within  this  decade.  However, 
systems that can handle unrestricted natural discourse and
understand the motivation of the communicators remain a distant goal,
probably requiring more than a decade before useful systems
appear.

As  natural  language  interfaces  coupled  to  intelligent  computer 
programs  become  widespread,  major  changes  in  our  society  are 
likely  to  result.  There  is  a trend  now  to  replace  relatively 
unskilled  white  collar  and  factory  work  with  trained  computer 
personnel  operating  computer-based  systems.  However,  with  the  advent 
of  friendly  interfaces  (and  eventually  even  speech  understanding 
systems  and  automatic  text  generation  from  speech)  relatively 
unskilled  personnel  will  be  able  to  control  complex  machines, 
operations,  and  computer  programs.  As  this  occurs,  even  relatively 
skilled  factory  and  white  collar  work  may  be  taken  over  by  these 
lesser  skilled  personnel  with  their  computer  aids  - the  experts 
and  computer  personnel  moving  on  to  develop  new  programs  and 
applications. 
The outcome of such a revolution cannot be fully predicted at
this time, other than to suggest that much of the power of the
computer  age  will  become  available  to  everyone,  requiring  a 
rethinking  of  our  national  goals  and  life  styles. 
P. Further Sources of Information

1. Journals

o American Journal of Computational Linguistics - published
by the major society in NLP, the Association for Computational
Linguistics (ACL).

o SIGART Newsletter - ACM (Association for Computing Machinery)

o Artificial Intelligence

o Cognitive Science - Cognitive Science Society

o AI Magazine - American Association for AI (AAAI)

o Pattern Analysis and Machine Intelligence - IEEE

o International Journal of Man Machine Interactions

2. Conferences

o Computational Linguistics (COLING) - held biennially.
Next one is in July 1984 at Stanford University.

o International Joint Conference on AI (IJCAI) - biennial.
Current one in Germany, August 1983.

o ACL  Annual  Conference. 

o AAAI  annual  conferences. 

o ACM  conferences. 

o IEEE  Systems,  Man  & Cybernetics  Annual  Conferences. 

o Conference on Applied Natural Language Processing.
Sponsored jointly by ACL & NRL - Feb. 1983 in Santa
Monica, CA.

3. Recent Books

o Winograd, T., Language as a Cognitive Process, Vol. I,
Syntax, Reading, Mass.: Addison-Wesley, 1983.

o Lehnert, W.G. and Ringle, M.H. (eds.), Strategies for Natural
Language Processing, Hillsdale, N.J.: Lawrence Erlbaum, 1982.

o Sager, N., Natural Language Information Processing, Reading,
Mass.: Addison-Wesley, 1981.

o Tennant, H., Natural Language Processing, New York: Petrocelli,
1981.

o Brady, M., Computational Approaches to Discourse, Cambridge,
Mass.: MIT Press, 1982.

o Joshi, A.K., Webber, B.L. and Sag, I.A. (eds.), Elements
of Discourse Understanding, Cambridge: Cambridge University
Press, 1981.

o Bolc, L. (ed.), Natural Language Communication with Computers,
Berlin: Springer-Verlag, 1981.

o Bolc, L. (ed.), Data Base Question Answering Systems, Berlin:
Springer-Verlag, 1982.

o Schank, R.C. and Riesbeck, C.K., Inside Computer Understanding,
Hillsdale, N.J.: Lawrence Erlbaum, 1981.


4. Overviews and Surveys

o Barr, A. and Feigenbaum, E.A., Chapter IV, "Understanding Natural
  Language," The Handbook of Artificial Intelligence, Vol. I, Los
  Altos, CA: W. Kaufmann, 1981, pp. 223-322.

o S.J. Kaplan, "Special Section - Natural Language," SIGART
  Newsletter, No. 79, Jan. 1982, pp. 27-109.

o Charniak, E., "Six Topics in Search of a Parser: An Overview of
  AI Language Research," IJCAI-81, pp. 1079-1087.

o Waltz, D.L., "The State of the Art in Natural Language
  Understanding," in Strategies for Natural Language Processing,
  W.G. Lehnert and M.H. Ringle (eds.), Hillsdale, N.J.: Lawrence
  Erlbaum, 1982, pp. 3-32.

o Slocum, J., "A Practical Comparison of Parsing Strategies for
  Machine Translation and Other Natural Language Processing
  Purposes," Tech. Report NL-41, Dept. of C.S., U. of Texas,
  Aug. 1981.

o Hendrix, G.G. and Sacerdoti, E.D., "Natural-Language Processing:
  The Field in Perspective," Byte, Sept. 1981, pp. 304-352.

Glossary


Anaphora: The repetition of a word or phrase at the beginning of
successive statements, questions, etc.

C.A.I.: Computer-Aided Instruction

Case: A semantically relevant syntactic relationship.

Case Frame: An ordered set of cases for each verb form.

Case Grammar: A form of Transformational Grammar in which the deep
structure is based on cases.

Computational Linguistics: The study of processing language with
a computer.

Conceptual Dependency (CD): An approach, related to case frames,
in which sentences are translated into basic concepts expressed
in a small set of semantic primitives.

DB: Data Base

DBMS: Data Base Management System.

Deep Structure: The underlying formal canonical syntactic
structure, associated with a sentence, that indicates the
sense of the verbs and includes subjects and objects that
may be implied but are missing from the original sentence.

Discourse: Conversation, or exchange of ideas.

Domain: Subject area of the communication.

Frame: A data structure for grouping information on a whole
situation, complex object, or series of events.

Grammar: A scheme for specifying the sentences allowed in
a language, indicating the syntactic rules for combining
words into well-formed phrases and clauses.

Heuristic: Rule of thumb or empirical knowledge used to help
guide a solution.

KB: Knowledge Base

Lexicon: A vocabulary or list of words relating to a particular
subject or activity.

Linguistics: The scientific study of language.

Morphology: The arrangement and interrelationship of morphemes
in words.

Morpheme: The smallest meaningful unit of a language, whether
a word, base or affix.

Network Representation: A data structure consisting of nodes
and labeled connecting arcs.

NL: Natural Language

NLI: Natural Language Interface

NLP: Natural Language Processing.

NLU: Natural Language Understanding

Parse Tree: A tree-like data structure of a sentence, resulting
from syntactic analysis, that shows the grammatical
relationships of the words in the sentence.

Parsing: Processing an input sentence to produce a more useful
representation.

Phonemes: The fundamental speech sounds of a language.

Phrase Structure Grammar: Also referred to as Context Free Grammar.
Type 2 of a series of grammars defined by Chomsky. A relatively
natural grammar, it has been one of the most useful in
natural-language processing.

Pragmatics: The study of the use of language in context.

Script: A frame-like data structure for representing stereotyped
sequences of events to aid in understanding simple stories.

Semantic Grammar: A grammar for a limited domain that, instead
of using conventional syntactic constituents such as noun
phrases, uses meaningful components appropriate to the domain.

Semantics: The study of meaning.

Sense: Meaning.

Surface Structure: A parse tree obtained by applying syntactic
analysis to a sentence.

Syntax: The study of arranging words in phrases and sentences.

Template: A prototype model or structure that can be used for
sentence interpretation.

Tense: A form of a verb that relates it to time.

Transformational Grammar: A phrase structure grammar that
incorporates transformational rules to obtain the deep structure
from the surface structure.
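
As an illustration of the glossary entries for Phrase Structure
Grammar, Parsing, and Parse Tree, the following is a minimal Python
sketch of a top-down parser for a toy context-free grammar. The
grammar rules, the lexicon, and the function names (parse, expand,
parse_sentence) are hypothetical and chosen only for this example;
they are not taken from any system discussed in this report.

```python
# Toy phrase structure (context-free) grammar. The rules and the
# lexicon are illustrative assumptions, not from the report.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {
    "the": "Det", "a": "Det",
    "dog": "N", "ball": "N",
    "chased": "V", "slept": "V",
}


def parse(symbol, words, start):
    """Yield (tree, next_position) for each way `symbol` can cover
    words[start:next_position]."""
    if symbol not in GRAMMAR:
        # Lexical category: consume one word if its category matches.
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            yield (symbol, words[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, words, start, [])


def expand(symbol, rhs, words, pos, children):
    """Match the remaining right-hand-side symbols left to right,
    backtracking over the alternatives each symbol allows."""
    if not rhs:
        yield (symbol, *children), pos
        return
    for child, nxt in parse(rhs[0], words, pos):
        yield from expand(symbol, rhs[1:], words, nxt, children + [child])


def parse_sentence(sentence):
    """Return the first parse tree covering the whole sentence, or
    None if the grammar does not accept it."""
    words = sentence.lower().split()
    for tree, end in parse("S", words, 0):
        if end == len(words):
            return tree
    return None


if __name__ == "__main__":
    print(parse_sentence("The dog chased a ball"))
```

Each tree is a nested tuple whose first element is the grammatical
category, so the result shows the grammatical relationships of the
words in the sentence. A word sequence that the rules cannot derive,
such as "dog the chased", yields None.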